This paper presents a grammar- and semantic-corpus-based similarity algorithm for natural language sentences. Natural language, as opposed to “artificial language” such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, and even ontology-based approaches that compare concept similarity rather than co-occurring terms or words, may fail to identify a match when two natural language sentences share no obvious relation or concept overlap. This paper proposes a sentence similarity algorithm that uses a corpus-based ontology and grammatical rules to overcome these problems. Experiments on two well-known benchmarks demonstrate that the proposed algorithm significantly improves performance on sentences and short texts with arbitrary syntax and structure.